The Role of Mutual Information in Variational Classifiers
Vera, Matias, Vega, Leonardo Rey, Piantanida, Pablo
Overfitting is a well-known phenomenon in which a model mimics a particular instance of data too closely (or exactly) and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various, sometimes heuristic, regularization techniques, which are motivated by upper bounds on the generalization error. In this work, we study the generalization error of classifiers relying on stochastic encodings trained with the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds on the generalization error showing that there exists a regime in which the generalization error is bounded by the mutual information between the input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds provide an information-theoretic understanding of generalization in the so-called class of variational classifiers, which are regularized by a Kullback-Leibler (KL) divergence term. These results give theoretical grounds for the highly popular KL term in variational inference methods, which was already recognized to act effectively as a regularization penalty. We further observe connections with well-studied notions such as Variational Autoencoders, Information Dropout, the Information Bottleneck and Boltzmann Machines. Finally, we perform numerical experiments on the MNIST and CIFAR datasets and show that mutual information is indeed highly representative of the behaviour of the generalization error.
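Concretely, a variational classifier of this kind is typically trained on an objective of the following shape (a minimal sketch; the encoder \(p_\theta\), decoder \(q_\phi\), prior \(q(z)\) and weight \(\beta\) are generic notation, not necessarily the paper's):

\[
\mathcal{L}(\theta,\phi) = \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p_\theta(z\mid x)}\big[-\log q_\phi(y\mid z)\big] + \beta\,\mathbb{E}_{p(x)}\big[\mathrm{KL}\big(p_\theta(z\mid x)\,\|\,q(z)\big)\big].
\]

Since \(\mathbb{E}_{p(x)}[\mathrm{KL}(p_\theta(z\mid x)\,\|\,q(z))] = I(X;Z) + \mathrm{KL}(p_\theta(z)\,\|\,q(z)) \ge I(X;Z)\), the KL penalty upper-bounds exactly the mutual information that the abstract identifies as controlling the generalization error.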
Discovery and Separation of Features for Invariant Representation Learning
Jaiswal, Ayush, Brekelmans, Rob, Moyer, Daniel, Steeg, Greg Ver, AbdAlmageed, Wael, Natarajan, Premkumar
Supervised machine learning models often associate irrelevant nuisance factors with the prediction target, which hurts generalization. We propose a framework for training robust neural networks that induces invariance to nuisances through learning to discover and separate predictive and nuisance factors of data. We present an information-theoretic formulation of our approach, from which we derive training objectives and establish its connections with previous methods. Empirical results on a wide array of datasets show that the proposed framework achieves state-of-the-art performance, without requiring nuisance annotations during training.
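One generic information-theoretic formulation of this discover-and-separate idea splits the representation into a predictive part \(z_p\) and a nuisance part \(z_n\) (a sketch of the general principle with assumed weights \(\lambda_1, \lambda_2\); not necessarily the authors' exact objective):

\[
\max \; I(Z_p; Y) + \lambda_1\, I\big((Z_p, Z_n); X\big) - \lambda_2\, I(Z_p; Z_n),
\]

so that \(z_p\) predicts the target, the pair \((z_p, z_n)\) jointly retains the input, and the predictive and nuisance factors share as little information as possible.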
Information in Infinite Ensembles of Infinitely-Wide Neural Networks
Shwartz-Ziv, Ravid, Alemi, Alexander A.
One promising research direction is to view deep neural networks through the lens of information theory (Tishby and Zaslavsky, 2015). Abstractly, deep connections exist between the information a learning algorithm extracts and its generalization capabilities (Bassily et al., 2017; Banerjee, 2006). Inspired by these general results, recent papers have attempted to measure information-theoretic quantities in ordinary deterministic neural networks (Shwartz-Ziv and Tishby, 2017; Achille and Soatto, 2017; Achille and Soatto, 2019). Both practical and theoretical problems arise in the deterministic case (Amjad and Geiger, 2018; Saxe et al., 2018; Kolchinsky et al., 2018). These difficulties stem from the fact that mutual information (MI) is reparameterization independent (Cover and Thomas, 2012). One workaround is to make a network explicitly stochastic, either in its activations (Alemi et al., 2016) or its weights (Achille and Soatto, 2017).
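As a concrete illustration of the stochastic-activations workaround, below is a minimal PyTorch-style sketch of a layer whose output is a Gaussian sample, in the spirit of Alemi et al. (2016); the class name, dimensions, and Gaussian noise model are illustrative assumptions, not code from the works cited above:

```python
import torch
import torch.nn as nn

class StochasticLayer(nn.Module):
    """Maps x to a sample z ~ N(mu(x), sigma(x)^2).

    Because z is genuinely random given x, the mutual information
    I(X; Z) is finite and well-defined, unlike in a deterministic
    network, where it is typically infinite or vacuous.
    """
    def __init__(self, in_dim: int, z_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)         # mean head
        self.log_sigma = nn.Linear(in_dim, z_dim)  # log-std head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mu = self.mu(x)
        sigma = self.log_sigma(x).exp()
        # Reparameterization trick: sample via standard normal noise.
        return mu + sigma * torch.randn_like(mu)
```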
Class-Conditional Compression and Disentanglement: Bridging the Gap between Neural Networks and Naive Bayes Classifiers
Amjad, Rana Ali, Geiger, Bernhard C.
In this draft, which reports on work in progress, we 1) adapt the information bottleneck functional by replacing the compression term with class-conditional compression, 2) relax this functional using a variational bound related to class-conditional disentanglement, 3) consider this functional as a training objective for stochastic neural networks, and 4) show that the latent representations are learned such that they can be used in a naive Bayes classifier. We continue by suggesting a series of experiments along the lines of Nonlinear Information Bottleneck [Kolchinsky et al., 2018], Deep Variational Information Bottleneck [Alemi et al., 2017], and Information Dropout [Achille and Soatto, 2018]. We furthermore suggest a neural network in which the decoder architecture is a parameterized naive Bayes decoder. We consider a classification task with a feature random variable (RV) X on R and a class RV Y on the finite set Y of classes. We further consider stochastic feed-forward neural networks (NNs).
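Schematically, step 1) replaces the usual compression term I(X;Z) of the information bottleneck with its class-conditional counterpart, and step 4) amounts to a decoder with naive Bayes structure over the coordinates of the latent representation (a sketch in standard IB notation; the trade-off weight \(\beta\) is an assumed parameterization, not necessarily the draft's):

\[
\min_{p(z\mid x)} \; I(X;Z\mid Y) - \beta\, I(Y;Z), \qquad q(y\mid z) \propto p(y)\prod_{i=1}^{d} q(z_i\mid y).
\]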
Understanding the Behaviour of the Empirical Cross-Entropy Beyond the Training Distribution
Vera, Matias, Piantanida, Pablo, Vega, Leonardo Rey
Machine learning theory has mostly focused on generalization to samples from the same distribution as the training data. However, a better understanding of generalization beyond the training distribution, where the observed distribution changes, is also fundamentally important for achieving a more powerful form of generalization. In this paper, we study, through the lens of information measures, how a particular architecture behaves when the true probability law of the samples may differ between training and testing times. Our main result is that the testing gap between the empirical cross-entropy and its statistical expectation (measured with respect to the testing probability law) can be bounded with high probability by the mutual information between the input testing samples and the corresponding representations, generated by the encoder obtained at training time. These theoretical results are supported by numerical simulations showing that the mentioned mutual information is representative of the testing gap, qualitatively capturing its dynamics in terms of the hyperparameters of the network.
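A bound of the kind described above has, schematically, the following shape (the constants \(C_1, C_2\) and the exact dependence on the number of test samples \(n\) and confidence level \(\delta\) are illustrative assumptions, not the paper's precise statement):

\[
\big|\widehat{\mathcal{L}}_n - \mathbb{E}[\mathcal{L}]\big| \le C_1 \sqrt{\frac{I(X;Z)}{n}} + C_2 \sqrt{\frac{\log(1/\delta)}{n}} \quad \text{with probability at least } 1-\delta,
\]

where \(I(X;Z)\) is computed under the testing law with the encoder fixed at training time.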